allow restricting Kube metadata to local node only #1440

Merged
mariomac merged 10 commits into grafana:main from the kube-meta-local branch
Dec 13, 2024

@mariomac mariomac commented Dec 10, 2024

Adds the BEYLA_KUBE_META_RESTRICT_LOCAL_NODE configuration option that allows configuring the local informer to only watch the Kubernetes Pods from the local node. This will alleviate the memory load, especially during startup.
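
As a rough illustration of what restricting the informer to the local node could look like, here is a minimal sketch. It is not Beyla's actual implementation: the `podFieldSelector` helper and the `NODE_NAME` environment variable are assumptions made for this example, and only the `BEYLA_KUBE_META_RESTRICT_LOCAL_NODE` name comes from this PR. It relies on `spec.nodeName`, a field selector that the Kubernetes API supports for Pods.

```go
package main

import (
	"fmt"
	"os"
)

// podFieldSelector is a hypothetical helper: it returns the field selector
// that a Pod watch could use. An empty selector means "watch Pods on every node".
func podFieldSelector(restrictLocalNode bool, nodeName string) string {
	if !restrictLocalNode || nodeName == "" {
		return ""
	}
	// spec.nodeName is a field selector supported by the Kubernetes API for Pods.
	return "spec.nodeName=" + nodeName
}

func main() {
	// Only BEYLA_KUBE_META_RESTRICT_LOCAL_NODE comes from this PR. Reading the
	// node name from NODE_NAME and parsing the option as "true" are assumptions.
	restrict := os.Getenv("BEYLA_KUBE_META_RESTRICT_LOCAL_NODE") == "true"
	selector := podFieldSelector(restrict, os.Getenv("NODE_NAME"))
	fmt.Printf("pod field selector: %q\n", selector)
}
```

With a selector like this, the API server only sends the Pods scheduled on one node, so the informer cache stays proportional to the node size rather than the cluster size.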

codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 19 lines in your changes missing coverage. Please review.

Project coverage is 81.14%. Comparing base (0a0bb6f) to head (60bb50e).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/kubecache/meta/informers_init.go | 82.81% | 6 Missing and 5 partials ⚠️ |
| pkg/internal/kube/informer_provider.go | 27.27% | 6 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1440      +/-   ##
==========================================
+ Coverage   80.97%   81.14%   +0.17%     
==========================================
  Files         149      149              
  Lines       15255    15309      +54     
==========================================
+ Hits        12353    12423      +70     
+ Misses       2293     2274      -19     
- Partials      609      612       +3     

| Flag | Coverage Δ |
|---|---|
| integration-test | 59.59% <1.31%> (+0.08%) ⬆️ |
| k8s-integration-test | 60.70% <75.00%> (+0.32%) ⬆️ |
| oats-test | 33.79% <1.31%> (-0.12%) ⬇️ |
| unittests | 51.90% <34.21%> (-0.07%) ⬇️ |


Hello @mariomac!
Backport pull requests need to be either:

  • Pull requests which address bugs,
  • Urgent fixes which need product approval, in order to get merged,
  • Docs changes.

Please, if the current pull request addresses a bug fix, label it with the type/bug label.
If it already has the product approval, please add the product-approved label. For docs changes, please add the type/docs label.
If the pull request modifies CI behaviour, please add the type/ci label.
If none of the above applies, please consider removing the backport label and target the next major/minor release.
Thanks!

This PR must be merged before a backport PR will be created.

@grcevski grcevski left a comment

LGTM! I think the test failure might be related to aggressive expiration of metrics, or maybe to confusion about what the name should be. I see these values expire in the entries where the name is testserver-....

2024-12-12T16:40:35.912684579Z stdout F time=2024-12-12T16:40:35.912Z level=DEBUG msg="storing new metric label set" component=otel.Expirer type=*metric.int64Inst labelValues="[172.18.0.2 43558 request 10.244.0.5 10.244.0.0/16 testserver-858cdf668b-xpnlx 8080  egress my-kube testserver-858cdf668b-xpnlx default 172.18.0.2 test-kind-cluster-netolly-control-plane testserver Deployment Pod internal-pinger-net default 172.18.0.2 test-kind-cluster-netolly-control-plane internal-pinger-net Pod Pod 8080 10.244.0.9 10.244.0.0/16 internal-pinger-net TCP]"

However, the next label set is recorded with testserver as the name, i.e. without the dash and suffix:

2024-12-12T16:41:02.912577238Z stdout F time=2024-12-12T16:41:02.912Z level=DEBUG msg="storing new metric label set" component=otel.Expirer type=*metric.int64Inst labelValues="[172.18.0.2 43558 request 10.96.94.38 10.96.0.0/16 testserver 8080  egress my-kube testserver default   testserver Service Service internal-pinger-net default 172.18.0.2 test-kind-cluster-netolly-control-plane internal-pinger-net Pod Pod 8080 10.244.0.9 10.244.0.0/16 internal-pinger-net TCP]"

All subsequent labels are without the dash...

@@ -470,6 +521,10 @@ func (inf *Informers) ipInfoEventHandler(ctx context.Context) *cache.ResourceEve
 	return &cache.ResourceEventHandlerFuncs{
 		AddFunc: func(obj interface{}) {
 			metrics.InformerNew()
 			em := obj.(*indexableEntity).EncodedMeta
 			for _, ip := range em.Ips {

I think we probably want to remove this debug print :)

mariomac commented Dec 13, 2024

Good catch @grcevski! The duplicated messages might be because testserver is captured both via a Pod (name with suffix) and via a Service (name without suffix), but I'll check anyway that the IPs do not collide and that the expiration is properly set.

EDIT: you were right! The expiration was too early, though not in Beyla, as I initially understood, but in the Prometheus TSDB.
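
The Pod-versus-Service explanation can be sketched with a small IP-indexed lookup. This is only an illustration, not Beyla's code: the `entity` and `ipIndex` types and their methods are hypothetical, while the IPs and names are taken from the two log lines quoted in the review.

```go
package main

import "fmt"

// entity is a hypothetical, simplified stand-in for the Kubernetes metadata
// attached to an IP address.
type entity struct {
	Kind string
	Name string
}

// ipIndex maps an IP address to the entity that owns it.
type ipIndex map[string]entity

// add registers every IP of the entity, overwriting any earlier owner.
func (idx ipIndex) add(e entity, ips ...string) {
	for _, ip := range ips {
		idx[ip] = e
	}
}

// nameFor returns the name used to label a flow towards ip,
// or the raw IP when no metadata is known for it.
func (idx ipIndex) nameFor(ip string) string {
	if e, ok := idx[ip]; ok {
		return e.Name
	}
	return ip
}

func main() {
	idx := ipIndex{}
	// Pod IP and Service cluster IP as they appear in the quoted logs.
	idx.add(entity{Kind: "Pod", Name: "testserver-858cdf668b-xpnlx"}, "10.244.0.5")
	idx.add(entity{Kind: "Service", Name: "testserver"}, "10.96.94.38")

	fmt.Println(idx.nameFor("10.244.0.5"))
	fmt.Println(idx.nameFor("10.96.94.38"))
}
```

Because the Pod IP and the Service IP are different keys, the same workload can legitimately produce two label sets, one with the suffixed Pod name and one with the plain Service name, without the IPs colliding.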

@mariomac mariomac merged commit 3b31c2e into grafana:main Dec 13, 2024
15 checks passed
@mariomac mariomac deleted the kube-meta-local branch December 13, 2024 11:55

The backport to release-1.9 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-1440-to-release-1.9 origin/release-1.9
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x 3b31c2e0be030243e4f8ca73a61fcefbb0387ecd

When the conflicts are resolved, stage and commit the changes:

git add . && git cherry-pick --continue

If you have the GitHub CLI installed:

# Push the branch to GitHub:
git push --set-upstream origin backport-1440-to-release-1.9
# Create the PR body template
PR_BODY=$(gh pr view 1440 --json body --template 'Backport 3b31c2e0be030243e4f8ca73a61fcefbb0387ecd from #1440{{ "\n\n---\n\n" }}{{ index . "body" }}')
# Create the PR on GitHub
echo "${PR_BODY}" | gh pr create --title '[release-1.9] allow restricting Kube metadata to local node only' --body-file - --label 'product-approved' --label 'backport' --base release-1.9 --milestone release-1.9 --web

Or, if you don't have the GitHub CLI installed (we recommend you install it!):

# Push the branch to GitHub:
git push --set-upstream origin backport-1440-to-release-1.9

# Create a pull request where the `base` branch is `release-1.9` and the `compare`/`head` branch is `backport-1440-to-release-1.9`.

# Remove the local backport branch
git switch main
git branch -D backport-1440-to-release-1.9

mariomac added a commit that referenced this pull request Dec 13, 2024
* allow restricting Kube metadata to local node only

* integration tests

* increase number of cores for test runners

* document new option

* added extra logging

* increased prometheus TSDB retention

* restore lower git action machines

(cherry picked from commit 3b31c2e)